5 research outputs found

    Diversity and novelty in web search, recommender systems and data streams

    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 7th ACM International Conference on Web Search and Data Mining, http://dx.doi.org/10.1145/2556195.2556199
    This tutorial aims to provide a unifying account of current research on diversity and novelty in the domains of web search, recommender systems, and data stream processing.

    Exploiting interclass rules for focused crawling

    A focused crawler gathers relevant Web pages on a particular topic. This rule-based Web-crawling approach uses linkage statistics among topics to improve a baseline focused crawler’s harvest rate and coverage.

    Case Western Reserve University

    In this article, we discuss the issues involved in adding a native score management system to object-relational databases, to be used in querying Web metadata (which describes the semantic content of Web resources). The Web metadata model is based on topics (representing entities), relationships among topics (called metalinks), and importance scores (sideway values) of topics and metalinks. We extend database relations with scoring functions and importance scores. We add score-management clauses with well-defined semantics to SQL, and propose the sideway value algebra (SVA) to evaluate the extended SQL queries. The SQL extensions and the SVA are illustrated through two Web resources, namely the DBLP Bibliography and the SIGMOD Anthology. The SQL extensions include clauses for propagating input tuple importance scores to output tuples during query processing, clauses that specify query stopping conditions, threshold predicates (a type of approximate similarity predicate for text comparisons), and user-defined-function-based predicates. The propagated importance scores are then used to rank and return a small numbe

    Topic-Centric Querying of Web Information Resources

    This paper deals with the problem of modeling web information resources using expert knowledge and personalized user information, and querying them in terms of topics and topic relationships. We propose a model for web information resources, and a query language, SQL-TC (Topic-Centric SQL), to query the model. The model is composed of web-based information resources (XML or HTML documents on the web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users' preferences as to which expert advice they would like to follow, which to ignore, etc.). The query language SQL-TC makes use of the metadata provided in expert advice repositories and embedded in information resources, and employs user preferences to further refine the query output. Query output objects/tuples are ranked with respect to the (expert-judged and user-preference-revised) importance values of requested topics/metalinks, and the query output is limited to the top n-ranked objects/tuples, to objects/tuples with importance values above a given threshold, or both.
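
    The output-limiting rule described in this abstract (rank by importance, then keep the top n, those above a threshold, or both) can be illustrated with a minimal Python sketch. This is not SQL-TC itself: the names `ScoredTuple` and `limit_output`, the numeric importance scale, and the sample topics are all hypothetical, chosen only to show the ranking and cut-off logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScoredTuple:
    """Hypothetical query output tuple carrying an importance value."""
    topic: str
    importance: float  # assumed scale; the abstract does not specify one


def limit_output(rows: List[ScoredTuple],
                 top_n: Optional[int] = None,
                 threshold: Optional[float] = None) -> List[ScoredTuple]:
    """Rank rows by importance (descending), then apply either or both limits."""
    # Keep only rows strictly above the threshold, if one is given.
    kept = [r for r in rows if threshold is None or r.importance > threshold]
    # Rank the survivors from most to least important.
    kept.sort(key=lambda r: r.importance, reverse=True)
    # Truncate to the top n rows, if a limit is given.
    return kept if top_n is None else kept[:top_n]


rows = [
    ScoredTuple("query optimization", 0.9),
    ScoredTuple("data mining", 0.4),
    ScoredTuple("web crawling", 0.7),
    ScoredTuple("indexing", 0.2),
]
result = limit_output(rows, top_n=2, threshold=0.3)
print([r.topic for r in result])
```

    Filtering before truncating means the two limits combine as an intersection: a tuple is returned only if it clears the threshold and also ranks within the top n of the tuples that do.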